504 research outputs found

    Parameterized Complexity of the k-anonymity Problem

    The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization that has been recently proposed is k-anonymity. This approach requires that the rows of a table are partitioned in clusters of size at least k and that all the rows in a cluster become the same tuple, after the suppression of some entries. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is known to be APX-hard even when the record values are over a binary alphabet and k=3, and when the records have length at most 8 and k=4. In this paper we study how the complexity of the problem is influenced by different parameters, first showing that the problem is W[1]-hard when parameterized by the size of the solution (and the value k). Then we exhibit a fixed-parameter algorithm, when the problem is parameterized by the size of the alphabet and the number of columns. Finally, we investigate the computational (and approximation) complexity of the k-anonymity problem, when restricting the instance to records having length bounded by 3 and k=3. We show that the problem remains APX-hard under this restriction. Comment: 22 pages, 2 figures
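
The objective described in the abstract above can be illustrated with a minimal sketch. It only evaluates the cost of a partition that is already given; it does not search for an optimal one, which is the hard part studied in the paper. The function name `suppression_cost` and the sample rows are hypothetical and chosen for illustration only.

```python
from typing import Sequence


def suppression_cost(rows: Sequence[str],
                     clusters: Sequence[Sequence[int]],
                     k: int) -> int:
    """Count the entries that must be suppressed so that all rows in each
    cluster become the same tuple (hypothetical helper, not from the paper).

    rows     -- table rows, all of the same length (one character per column)
    clusters -- a partition of the row indices 0..len(rows)-1
    k        -- minimum allowed cluster size
    """
    # Every row index must appear in exactly one cluster.
    if sorted(i for c in clusters for i in c) != list(range(len(rows))):
        raise ValueError("clusters must partition the row indices")

    cost = 0
    for cluster in clusters:
        if len(cluster) < k:
            raise ValueError("every cluster needs at least k rows")
        members = [rows[i] for i in cluster]
        # A column on which the cluster disagrees must be suppressed
        # in every row of that cluster.
        for column in zip(*members):
            if len(set(column)) > 1:
                cost += len(cluster)
    return cost


# Illustrative binary table with six rows and four columns.
example_rows = ["0011", "0111", "0001", "1100", "1110", "1000"]
example_clusters = [[0, 1, 2], [3, 4, 5]]
example_cost = suppression_cost(example_rows, example_clusters, k=3)
```

In this example each cluster disagrees on two columns, so two entries are suppressed in each of its three rows. Evaluating a fixed partition takes time linear in the table size; the hardness results in the abstract concern choosing the partition.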

    Mariages mixtes, migration féminine et travail domestique: un regard sur la situation italienne

    The article stimulates a reflection on the theme of mixed marriages in Italy, with special reference to the so-called “caregivers’ marriages”. In recent years, these have been subject to an increasing stigmatization in the Italian public debate, contributing to legitimize some recent reforms in the national pension system. Drawing on the narrative of a young domestic worker married to an older Italian man, the article calls for a more complex and multifaceted vision of these processes, which have so far received little attention from the Italian and international research in the field.

    Covering Pairs in Directed Acyclic Graphs

    The Minimum Path Cover problem on directed acyclic graphs (DAGs) is a classical problem that provides a clear and simple mathematical formulation for several applications in different areas and that has an efficient algorithmic solution. In this paper, we study the computational complexity of two constrained variants of Minimum Path Cover motivated by the recent introduction of next-generation sequencing technologies in bioinformatics. The first problem (MinPCRP), given a DAG and a set of pairs of vertices, asks for a minimum cardinality set of paths "covering" all the vertices such that both vertices of each pair belong to the same path. For this problem, we show that, while it is NP-hard to compute if there exists a solution consisting of at most three paths, it is possible to decide in polynomial time whether a solution consisting of at most two paths exists. The second problem (MaxRPSP), given a DAG and a set of pairs of vertices, asks for a path containing the maximum number of the given pairs of vertices. We show its NP-hardness and also its W[1]-hardness when parametrized by the number of covered pairs. On the positive side, we give a fixed-parameter algorithm when the parameter is the maximum overlapping degree, a natural parameter in the bioinformatics applications of the problem.

    On the Complexity of t-Closeness Anonymization and Related Problems

    An important issue in releasing individual data is to protect the sensitive information from being leaked and maliciously utilized. Famous privacy preserving principles that aim to ensure both data privacy and data integrity, such as k-anonymity and l-diversity, have been extensively studied both theoretically and empirically. Nonetheless, these widely-adopted principles are still insufficient to prevent attribute disclosure if the attacker has partial knowledge about the overall sensitive data distribution. The t-closeness principle has been proposed to fix this, which also has the benefit of supporting numerical sensitive attributes. However, in contrast to k-anonymity and l-diversity, the theoretical aspect of t-closeness has not been well investigated. We initiate the first systematic theoretical study on the t-closeness principle under the commonly-used attribute suppression model. We prove that for every constant t such that 0 ≤ t < 1, it is NP-hard to find an optimal t-closeness generalization of a given table. The proof consists of several reductions each of which works for different values of t, which together cover the full range. To complement this negative result, we also provide exact and fixed-parameter algorithms. Finally, we answer some open questions regarding the complexity of k-anonymity and l-diversity left in the literature. Comment: An extended abstract to appear in DASFAA 201

    The zero exemplar distance problem

    Given two genomes with duplicate genes, Zero Exemplar Distance is the problem of deciding whether the two genomes can be reduced to the same genome without duplicate genes by deleting all but one copy of each gene in each genome. Blin, Fertin, Sikora, and Vialette recently proved that Zero Exemplar Distance for monochromosomal genomes is NP-hard even if each gene appears at most two times in each genome, thereby settling an important open question on genome rearrangement in the exemplar model. In this paper, we give a very simple alternative proof of this result. We also study the problem Zero Exemplar Distance for multichromosomal genomes without gene order, and prove the analogous result that it is also NP-hard even if each gene appears at most two times in each genome. For the positive direction, we show that both variants of Zero Exemplar Distance admit polynomial-time algorithms if each gene appears exactly once in one genome and at least once in the other genome. In addition, we present a polynomial-time algorithm for the related problem Exemplar Longest Common Subsequence in the special case that each mandatory symbol appears exactly once in one input sequence and at least once in the other input sequence. This answers an open question of Bonizzoni et al. We also show that Zero Exemplar Distance for multichromosomal genomes without gene order is fixed-parameter tractable if the parameter is the maximum number of chromosomes in each genome. Comment: Strengthened and reorganized

    Linear splicing and syntactic monoid

    Splicing systems were introduced by Head in 1987 as a formal counterpart of a biological mechanism of DNA recombination under the action of restriction and ligase enzymes. Despite the intensive studies on linear splicing systems, some elementary questions about their computational power are still open. In particular, in this paper we face the problem of characterizing the proper subclass of regular languages which are generated by finite (Paun) linear splicing systems. We introduce here the class of marker languages L, i.e., regular languages with the form L = L_1 [x]_1 L_2, where L_1, L_2 are regular languages, [x] is a syntactic congruence class satisfying special conditions and [x]_1 is either equal to [x] or equal to [x] ∪ {1}, 1 being the empty word. Using classical properties of formal language theory, we give an algorithm which allows us to decide whether a regular language is a marker language. Furthermore, for each marker language L we exhibit a finite Paun linear splicing system and we prove that this system generates L.

    Approximating Clustering of Fingerprint Vectors with Missing Values

    The problem of clustering fingerprint vectors is an interesting problem in Computational Biology that has been proposed in (Figueroa et al. 2004). In this paper we show some improvements in closing the gaps between the known lower bounds and upper bounds on the approximability of some variants of the biological problem. Namely, we are able to prove that the problem is APX-hard even when each fingerprint contains only two unknown positions. Moreover, we have studied some variants of the original problem, and we give two 2-approximation algorithms for the IECMV and OECMV problems when the number of unknown entries for each vector is at most a constant. Comment: 13 pages, 4 figures

    MALVA: Genotyping by Mapping-free ALlele Detection of Known VAriants

    The amount of genetic variation discovered in human populations is growing rapidly, leading to challenging computational tasks, such as variant calling. Standard methods for addressing this problem include read mapping, a computationally expensive procedure; thus, mapping-free tools have been proposed in recent years. These tools focus on isolated, biallelic SNPs, providing limited support for multi-allelic SNPs and short insertions and deletions of nucleotides (indels). Here we introduce MALVA, a mapping-free method to genotype an individual from a sample of reads. MALVA is the first mapping-free tool able to genotype multi-allelic SNPs and indels, even in high-density genomic regions, and to effectively handle a huge number of variants. MALVA requires one order of magnitude less time to genotype a donor than alignment-based pipelines, providing similar accuracy. Remarkably, on indels, MALVA provides even better results than the most widely adopted variant discovery tools. Biological Sciences; Genetics; Genomics; Bioinformatics

    Insights into an unexplored component of the mosquito repeatome: Distribution and variability of viral sequences integrated into the genome of the arboviral vector Aedes albopictus

    The Asian tiger mosquito Aedes albopictus is an invasive mosquito and a competent vector for public-health relevant arboviruses such as Chikungunya (Alphavirus), Dengue and Zika (Flavivirus) viruses. Unexpectedly, the sequencing of the genome of this mosquito revealed an unusually high number of integrated sequences with similarities to non-retroviral RNA viruses of the Flavivirus and Rhabdovirus genera. These Non-retroviral Integrated RNA Virus Sequences (NIRVS) are enriched in piRNA clusters and coding sequences and have been proposed to constitute novel mosquito immune factors. However, given the abundance of NIRVS and their variable viral origin, their relative biological roles remain unexplored. Here we used an analytical approach that intersects computational, evolutionary and molecular methods to study the genomic landscape of mosquito NIRVS. We demonstrate that NIRVS are differentially distributed across mosquito genomes, with a core set of seemingly the oldest integrations with similarity to Rhabdoviruses. Additionally, we compare the polymorphisms of NIRVS with respect to that of fast and slow-evolving genes within the Ae. albopictus genome. Overall, NIRVS appear to be less polymorphic than slow-evolving genes, with differences depending on whether they occur in intergenic regions or in piRNA clusters. Finally, two NIRVS that map within the coding sequences of genes annotated as Rhabdovirus RNA-dependent RNA polymerase and the nucleocapsid-encoding gene, respectively, are highly polymorphic and are expressed, suggesting exaptation possibly to enhance the mosquito's antiviral responses. These results greatly advance our understanding of the complexity of the mosquito repeatome and the biology of viral integrations in mosquito genomes.

    ASPicDB: a database of annotated transcript and protein variants generated by alternative splicing

    Alternative splicing is emerging as a major mechanism for the expansion of the transcriptome and proteome diversity, particularly in human and other vertebrates. However, the proportion of alternative transcripts and proteins actually endowed with functional activity is currently highly debated. We present here a new release of ASPicDB which now provides a unique annotation resource of human protein variants generated by alternative splicing. A total of 256 939 protein variants from 17 191 multi-exon genes have been extensively annotated through state of the art machine learning tools providing information of the protein type (globular and transmembrane), localization, presence of PFAM domains, signal peptides, GPI-anchor propeptides, transmembrane and coiled-coil segments. Furthermore, full-length variants can be now specifically selected based on the annotation of CAGE-tags and polyA signal and/or polyA sites, marking transcription initiation and termination sites, respectively. The retrieval can be carried out at gene, transcript, exon, protein or splice site level allowing the selection of data sets fulfilling one or more features settled by the user. The retrieval interface also enables the selection of protein variants showing specific differences in the annotated features. ASPicDB is available at http://www.caspur.it/ASPicDB/